Abstract. The topic of this paper is the typical behavior of the spectral measures of
large random matrices drawn from several ensembles of interest, including in particular 
matrices drawn from Haar measure on the classical Lie groups, random compressions of 
random Hermitian matrices, and the so-called random sum of two independent random 
matrices. In each case, we estimate the expected Wasserstein distance from the empirical 
spectral measure to a deterministic reference measure, and prove a concentration result 
for that distance. As a consequence we obtain almost sure convergence of the empirical 
spectral measures in all cases. 



1. Introduction 

The topic of this paper is the typical behavior of the spectral measures of large random 
matrices drawn from several ensembles of interest. Specifically, we consider random matrices 
drawn from Haar measure on the classical Lie groups O(n), SO(n), U(n), SU(n), and Sp(2n);
Dyson's circular ensembles; random compressions of random Hermitian matrices satisfying 
a concentration hypothesis (including random Wigner matrices as a special case); and a 
random matrix model considered in free probability described by the sum of two random 
Hermitian matrices, one of which has been subjected to a random basis change. In each 
case, we estimate the expected Wasserstein distance from the empirical spectral measure to 
a deterministic reference measure, and prove a concentration result for that distance. Our 
bounds are sufficient to obtain almost sure convergence of the empirical spectral measures 
(with rates in the Wasserstein distance) in all cases. 

The proofs follow the same approach as the recent work of E. Meckes [19] on random
projections of high-dimensional probability measures. The central idea is to view the
Wasserstein distance $d_1(\mu_M, \mu)$ from the empirical spectral measure $\mu_M$ of a random matrix $M$ to a
deterministic reference measure $\mu$ as the supremum of a stochastic process indexed by the
unit ball of the (infinite-dimensional) space $\mathrm{Lip}(\mathbb{C})$ of real-valued Lipschitz functions on
$\mathbb{C}$. Concentration properties of the random matrices considered imply that the stochastic
process in question satisfies a subgaussian increment condition; Dudley's entropy bound 
together with approximation arguments are then used to bound the expected supremum of 
the process. In the case of the classical Lie groups, earlier work by Diaconis and Mallows 
[6], Diaconis and Shahshahani [7], and Rains [25] is used to show that the deterministic 
reference measure can be taken to be the uniform measure on the circle, and the classical 
measure concentration results of Gromov and Milman [12] are used to obtain the needed 
concentration properties. For the Hermitian models, the deterministic reference measure 
used is simply the average of the empirical spectral measure and the matrices are assumed 
at the outset to satisfy a concentration hypothesis. 

Further history and motivation are discussed in Sections 2 and 3 below; the remainder of
this section is devoted to notation and conventions.
For a subset $A \subseteq \mathbb{C}$, the space of Lipschitz functions $f : A \to \mathbb{R}$ is denoted by $\mathrm{Lip}(A)$, and
is equipped with the Lipschitz seminorm $|\cdot|_{\mathrm{Lip}}$. Denote by $\mathcal{P}(A)$ the space of all probability
measures supported in $A$, and by $\mathcal{P}_p(A)$ the space of probability measures in $\mathcal{P}(A)$ with
finite $p$th moment, equipped with the $L_p$ Wasserstein distance $d_p$ defined by

(1.1)    $d_p(\mu, \nu) := \inf_{\pi} \left( \int |x - y|^p \, d\pi(x, y) \right)^{1/p}.$

The infimum above is over probability measures $\pi$ on $A \times A$ with marginals $\mu$ and $\nu$. Note
that $d_p \le d_q$ when $p \le q$. The $L_1$ Wasserstein distance can be equivalently defined (see,
e.g., W) by

(1.2)    $d_1(\mu, \nu) := \sup_{f} \int [f(x) - f(y)] \, d\mu(x) \, d\nu(y),$

where the supremum is over $f$ in the unit ball $B(\mathrm{Lip}(A))$ of $\mathrm{Lip}(A)$. In what follows,
"Wasserstein distance" with $p$ unspecified refers to $d_1$.

Denote by $M_n^{sa}$ the space of $n \times n$ Hermitian matrices, and by $N_n$ the space of $n \times n$ normal
matrices. Denote by U(n) the group of $n \times n$ unitary matrices, by O(n) the group of $n \times n$
real orthogonal matrices, by SU(n) and SO(n) respectively the special unitary and orthogonal
groups, and by $Sp(2n) \subseteq U(2n)$ the compact symplectic group. In all results below these
are understood to be equipped with the Hilbert-Schmidt norm $\|\cdot\|_{HS}$. For any $A \in N_n$, let
$\mu_A$ denote the spectral distribution of $A$; that is, if $\{\lambda_j\}_{j=1}^n$ are the eigenvalues of $A$, then
$\mu_A := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j}$.

For $A \in M_n^{sa}$, denote by $\delta(A) := \lambda_{\max}(A) - \lambda_{\min}(A)$ the spectral diameter of $A$. Note in
particular that

$\delta(A) = 2 \inf_{\lambda \in \mathbb{R}} \|A - \lambda I\|_{op},$

where $\|\cdot\|_{op}$ denotes the operator norm.

Throughout Sections 2 and 3, $c$, $C$, and similar symbols denote absolute positive constants,
whose exact values may vary from one instance to another.
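For intuition about definition (1.1), the following sketch computes $d_p$ between two empirical measures on the real line with the same number of atoms. In that special case, and for $p \ge 1$, the infimum is attained by pairing the sorted atoms. The function name `wasserstein_p` and the sample points are illustrative assumptions and do not come from the paper; measures on $\mathbb{C}$ (such as spectral measures of unitary matrices) would instead require solving an assignment problem.

```python
def wasserstein_p(xs, ys, p=1.0):
    """d_p between the empirical measures of xs and ys (real atoms, equal counts).

    Assumes p >= 1; on the real line the monotone (sorted) coupling is then optimal.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two nonempty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(u - v) ** p for u, v in pairs) / len(xs)
    return cost ** (1.0 / p)


# Hypothetical data: two three-atom measures on the line.
xs = [0.0, 1.0, 3.0]
ys = [1.0, 2.0, 5.0]
d1 = wasserstein_p(xs, ys, p=1)
d2 = wasserstein_p(xs, ys, p=2)
```

On this data the two values illustrate the monotonicity $d_1 \le d_2$ noted after (1.1).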



2. Random matrices in classical Lie groups 

This section is concerned primarily with a random matrix $U$ drawn according to Haar
measure from one of the classical compact Lie groups O(n), SO(n), U(n), SU(n), and
Sp(2n). It will be shown (see Corollary 2.7 below) that for fixed $n$, the empirical spectral
measure $\mu_U$ is tightly concentrated near the uniform measure $\nu$ on $\mathbb{S}^1 = \{z \in \mathbb{C} : |z| = 1\}$,
with mean Wasserstein distance of order at most $n^{-2/3}$ and subgaussian tail bounds. As a
consequence, it is shown (see Corollary 2.8) that the Wasserstein distance between $\mu_U$ and
$\nu$ is almost surely of order at most $n^{-2/3}$. We do not claim that these results are sharp;
in fact, there is reason to suspect that $n^{-2/3}$ could be replaced by $n^{-1}$, up to logarithmic
factors. However, to the best of our knowledge these are the first results which achieve any
bounds for these quantities.

Random matrices from these groups have been extensively studied, and much is already 
known. In particular, we use results from [6], [7], and [25] below in order to show that
the uniform distribution on the circle is the correct reference measure for these ensembles. 
In the case of the unitary and special unitary groups U(n) and SU(n), large deviations 
principles for the empirical spectral measures have been proved by Hiai and Petz [14] and 
Hiai, Petz, and Ueda [15], respectively. The rates in those LDPs are consistent with the level 
of concentration we obtain for the distance, and both results imply in particular the almost
sure convergence of the spectral measures, although the LDPs do not give information about 
the rates of convergence. It should be noted that almost sure convergence for random unitary 
matrices was proved prior to the results of Hiai and Petz in Voiculescu's paper [30]. As 
far as we know, almost sure convergence for the spectral distributions of matrices from the 
other groups above was not previously known. 

The approach taken in this section has three main steps: 

(1) The mean empirical spectral measure $\mu = \mathbb{E}\mu_U$ approximates $\nu$ in Wasserstein
distance (Theorem 2.1). This is shown using known moments of $\mu$ and classical results
on approximating Lipschitz functions on $\mathbb{S}^1$ by polynomials.

(2) The mean Wasserstein distance $\mathbb{E} d_1(\mu_U, \mu)$ is small (Theorem 2.6). Using definition
(1.2), the Wasserstein distance is interpreted as the supremum of a stochastic
process indexed by test functions. Concentration of measure on the classical Lie
groups implies that this process has subgaussian increments, allowing the expected
supremum to be estimated via entropy methods.

(3) The Wasserstein distance $d_1(\mu_U, \nu)$ is tightly concentrated near its mean (Corollary
2.7), and almost sure convergence of $\mu_U$, with the indicated rate in Wasserstein
distance, follows from the Borel-Cantelli lemma (Corollary 2.8). This concentration
is again shown using concentration of measure on the classical Lie groups.

In contrast to the proofs of the LDPs in [14, 15], the proofs here make no use of the joint
densities of eigenvalues in the classical Lie groups.
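Step (1) of the outline works with the moments $\int z^k \, d\mu_U = \frac{1}{n} \mathrm{tr}\, U^k$ rather than with eigenvalue densities. The sketch below samples a Haar-distributed unitary matrix by Gram-Schmidt orthonormalization of independent complex Gaussian columns (a standard construction, used here in place of anything specified in the paper) and evaluates these moments. All function names are illustrative assumptions, and the pure-Python linear algebra is only meant for small $n$.

```python
import random


def haar_unitary(n, rng):
    """Sample an n x n Haar unitary by Gram-Schmidt on complex Gaussian columns."""
    cols = []
    for _ in range(n):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        for _pass in range(2):  # a second pass limits rounding error
            for q in cols:
                c = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
                v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        cols.append([vi / norm for vi in v])
    # Assemble the matrix whose j-th column is cols[j].
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moment(u, k):
    """Return tr(U^k)/n for k >= 1: the integral of z^k against mu_U."""
    n = len(u)
    power = u
    for _ in range(k - 1):
        power = matmul(power, u)
    return sum(power[i][i] for i in range(n)) / n


def mean_moment(n, k, samples, seed=0):
    """Monte Carlo estimate of the k-th moment of the mean measure E mu_U."""
    rng = random.Random(seed)
    total = sum(spectral_moment(haar_unitary(n, rng), k) for _ in range(samples))
    return total / samples
```

For Haar measure on U(n) the estimate returned by `mean_moment` should be close to zero for every $k \ge 1$, reflecting the rotation invariance of the mean measure.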

There is an important technical caveat to the strategy outlined above, which is that the 
general concentration of measure results known for SO(n), SU(n), and Sp(2n) do not extend 
to O(n) and U(n). The latter two cases will instead be handled basically by reducing to 
the corresponding special groups. For this purpose it will be useful also to consider Haar 
measure on the coset $SO^-(n) = \{U \in O(n) : \det U = -1\}$. (In this case Haar measure
refers to invariance under the action of SO(n).)

The same strategy can also be carried out for random matrices from Dyson's Circular
Ensembles, as indicated in Theorem 2.9.

The first step of the plan of this section is achieved in the following theorem. Here and
in the following, $U \in G$ means that $U$ is distributed according to Haar measure on the
group (or coset) $G$. Recall that $\nu$ denotes the uniform probability measure on $\mathbb{S}^1$, and that
$\mu = \mathbb{E}\mu_U$.

Theorem 2.1. (1) If $U \in U(n)$ then $\mu = \nu$.

(2) If $U \in SU(n)$ then $d_1(\mu, \nu) \le \frac{C}{n}$.

(3) If $U \in SO(n)$, O(n), $SO^-(n)$, or Sp(2n), then $d_1(\mu, \nu) \le C \frac{\log n}{n}$.

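Proof. (1) For any fixed $\omega \in \mathbb{S}^1$, $\omega U$ is also Haar-distributed in U(n). Therefore $\mu$ is a
rotation-invariant probability measure on $\mathbb{S}^1$, hence equal to $\nu$.

(2) Observe first that $e^{2\pi i/n} I_n \in SU(n)$, and so $e^{2\pi i/n} U$ is Haar-distributed in SU(n).
Thus for any integer $k$,

$\mathbb{E}\, \mathrm{tr}\, U^k = \mathbb{E}\, \mathrm{tr}\, (e^{2\pi i/n} U)^k = e^{2\pi i k/n}\, \mathbb{E}\, \mathrm{tr}\, U^k.$

Therefore $\mathbb{E}\, \mathrm{tr}\, U^k = 0$ for $1 \le |k| < n$. If $g(z) = \sum_{|k| < n} a_k z^k$ is a trigonometric
polynomial on $\mathbb{S}^1$, it follows that

$\int g \, d\mu = \mathbb{E} \int g \, d\mu_U = \frac{1}{n} \sum_{|k| < n} a_k\, \mathbb{E}\, \mathrm{tr}\, U^k = a_0 = \int g \, d\nu.$

Now given $f : \mathbb{S}^1 \to \mathbb{R}$ which is 1-Lipschitz, Jackson's theorem (see, e.g., [26, Theorem
1.4]) implies that there is such a polynomial $g$ such that $\|f - g\|_\infty \le \frac{C}{n}$. Thus

$\left| \int f \, d\mu - \int f \, d\nu \right| \le \left| \int f \, d\mu - \int g \, d\mu \right| + \left| \int g \, d\nu - \int f \, d\nu \right| \le 2 \|f - g\|_\infty \le \frac{C}{n}.$

(3) By results of Diaconis and Mallows (see [6]), Diaconis and Shahshahani [7], and
Rains [25], in each of these cases $|\mathbb{E}\, \mathrm{tr}\, U^k| \le 1$ for $1 \le |k| < n$.

Given $f : \mathbb{S}^1 \to \mathbb{R}$ which is 1-Lipschitz, it is easy to check that $|\hat{f}(k)| \le \frac{C}{|k|}$ for
$|k| \ge 1$ (see, e.g., Theorem 4.6 of [17]). If

$S_n(z) := \sum_{k=-(n-1)}^{n-1} \hat{f}(k) z^k,$

then

$\left| \int S_n \, d\mu - \int S_n \, d\nu \right| = \left| \frac{1}{n} \sum_{1 \le |k| \le n-1} \hat{f}(k)\, \mathbb{E}\, \mathrm{tr}\, U^k \right| \le \frac{C}{n} \sum_{1 \le k \le n-1} \frac{1}{k} \le C \frac{\log n}{n}.$

A theorem of Lebesgue (see, e.g., [26, Theorem 2.2]) implies that

$\|f - S_n\|_\infty \le C (\log n) \inf_g \|f - g\|_\infty,$

where the infimum is over all trigonometric polynomials $g(z) = \sum_{|k| < n} a_k z^k$. Combined
with Jackson's theorem [26, Theorem 1.4] this implies that $\|f - S_n\|_\infty \le C \frac{\log n}{n}$,
and thus

$\left| \int f \, d\mu - \int f \, d\nu \right| \le \left| \int f \, d\mu - \int S_n \, d\mu \right| + \left| \int S_n \, d\mu - \int S_n \, d\nu \right| + \left| \int S_n \, d\nu - \int f \, d\nu \right| \le C \frac{\log n}{n}.$ □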


The second and third steps of the plan of this section rely on the following concentration
of measure property. This essentially follows from a general isoperimetric inequality for
Riemannian manifolds due to Gromov and Milman [12] and calculations of the Ricci curvature
of the classical Lie groups (for which see [1, Appendix F]). In the precise form stated
it follows from a result of Bakry and Emery [2] which shows that the same Ricci curvature
bounds imply a logarithmic Sobolev inequality, which in turn implies such a concentration
inequality (cf. [18, Chapter 5]).
Proposition 2.2 (See [1, Theorem 4.4.27]). Let $G$ be one of SO(n), $SO^-(n)$, SU(n), or
Sp(2n). Let $F : G \to \mathbb{R}$ be 1-Lipschitz with respect to the geodesic metric (induced by the
standard embedding in matrix space with the Hilbert-Schmidt norm). If $U \in G$, then

$\mathbb{P}[F(U) - \mathbb{E}F(U) \ge t] \le e^{-cnt^2}$

for every $t > 0$.

The geodesic metric on G dominates the Hilbert-Schmidt metric on matrix space, so the 
conclusion of Proposition 2.2 applies in particular to $F$ which is 1-Lipschitz with respect to
the Hilbert-Schmidt metric. 

The following lemma provides the necessary Lipschitz estimates for the functions to which 
the concentration property will be applied in this and the subsequent section. 

Lemma 2.3. The map $A \mapsto \mu_A$ from $N_n$ to $\mathcal{P}_1(\mathbb{C})$ taking a normal matrix to its spectral
measure is $n^{-1/2}$-Lipschitz. Furthermore, if $\rho \in \mathcal{P}_1(\mathbb{C})$ is any fixed probability measure, the
following statements hold.

(1) For any 1-Lipschitz function $f : \mathbb{C} \to \mathbb{R}$, the function

$A \mapsto \int f \, d\mu_A - \int f \, d\rho$

is $n^{-1/2}$-Lipschitz.

(2) The map $A \mapsto d_1(\mu_A, \rho)$ is $n^{-1/2}$-Lipschitz.

Proof. If $A$ and $B$ are $n \times n$ normal matrices, then the Hoffman-Wielandt inequality [3,
Theorem VI.4.1] implies that

(2.1)    $\min_{\sigma \in S_n} \sum_{j=1}^n |\lambda_j(A) - \lambda_{\sigma(j)}(B)|^2 \le \|A - B\|_{HS}^2,$

where $\lambda_1(A), \ldots, \lambda_n(A)$ and $\lambda_1(B), \ldots, \lambda_n(B)$ are the eigenvalues (with multiplicity, in any
order) of $A$ and $B$ respectively. Defining couplings of $\mu_A$ and $\mu_B$ given by

$\pi_\sigma = \frac{1}{n} \sum_{j=1}^n \delta_{(\lambda_j(A), \lambda_{\sigma(j)}(B))}$

for $\sigma \in S_n$, it follows from (1.1) and (2.1) that

$d_1(\mu_A, \mu_B)^2 \le d_2(\mu_A, \mu_B)^2 \le \min_{\sigma \in S_n} \int |w - z|^2 \, d\pi_\sigma(w, z) = \min_{\sigma \in S_n} \frac{1}{n} \sum_{j=1}^n |\lambda_j(A) - \lambda_{\sigma(j)}(B)|^2 \le \frac{1}{n} \|A - B\|_{HS}^2,$

proving the first statement of the lemma. The final claim that $A \mapsto d_1(\mu_A, \rho)$ is
$n^{-1/2}$-Lipschitz is now immediate.

By the definition in (1.2) of $d_1$, given a 1-Lipschitz $f : \mathbb{C} \to \mathbb{R}$, the mapping $\mathcal{P}_1(\mathbb{C}) \to \mathbb{R}$,
$\mu \mapsto \int f \, d\mu$ is 1-Lipschitz. Combined with the above argument, this implies that the
function

$A \mapsto \int f \, d\mu_A - \int f \, d\rho$

is $n^{-1/2}$-Lipschitz on $N_n$. □
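As a numerical sanity check of the $n^{-1/2}$-Lipschitz estimate, the sketch below treats the simplest nontrivial case, $2 \times 2$ real symmetric matrices, where the eigenvalues have a closed form. It is an illustration under stated assumptions rather than part of the argument: a matrix $[[a, b], [b, d]]$ is encoded as the tuple `(a, b, d)`, all function names are hypothetical, and the sorted pairing of eigenvalues is used because both spectral measures live on the real line.

```python
import math
import random


def eigenvalues_sym2(a, b, d):
    """Ascending eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    return mid - rad, mid + rad


def d1_spectral(m1, m2):
    """d_1 between the spectral measures of two 2x2 symmetric matrices.

    Each measure has two real atoms of mass 1/2, so sorted pairing is optimal.
    """
    lo1, hi1 = eigenvalues_sym2(*m1)
    lo2, hi2 = eigenvalues_sym2(*m2)
    return (abs(lo1 - lo2) + abs(hi1 - hi2)) / 2.0


def hs_distance(m1, m2):
    """Hilbert-Schmidt distance; the off-diagonal entry b appears twice."""
    (a1, b1, e1), (a2, b2, e2) = m1, m2
    return math.sqrt((a1 - a2) ** 2 + 2.0 * (b1 - b2) ** 2 + (e1 - e2) ** 2)


def worst_ratio(trials=1000, seed=0):
    """Largest observed d_1 / ||A - B||_HS over random pairs (A, B)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        m1 = tuple(rng.uniform(-1, 1) for _ in range(3))
        m2 = tuple(rng.uniform(-1, 1) for _ in range(3))
        worst = max(worst, d1_spectral(m1, m2) / hs_distance(m1, m2))
    return worst
```

With $n = 2$ the lemma predicts that `worst_ratio()` never exceeds $2^{-1/2}$; the pair $A = 0$, $B = I_2$ attains that value, so the constant cannot be improved in this case.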
Corollary 2.4. Let $G$ be one of SO(n), $SO^-(n)$, SU(n), or Sp(2n), and let $U \in G$.

(1) For any fixed probability measure $\rho \in \mathcal{P}(\mathbb{S}^1)$ and 1-Lipschitz $f : \mathbb{S}^1 \to \mathbb{R}$, define the
random variable

$X_f = \int f \, d\mu_U - \int f \, d\rho.$

Then

$\mathbb{P}[|X_f - \mathbb{E}X_f| \ge t] \le 2 e^{-cn^2t^2}$

for every $t > 0$.

(2) For any fixed probability measure $\rho \in \mathcal{P}(\mathbb{S}^1)$,

$\mathbb{P}[d_1(\mu_U, \rho) - \mathbb{E} d_1(\mu_U, \rho) \ge t] \le e^{-cn^2t^2}$

for every $t > 0$.

Proof. The first part of the corollary follows from Proposition 2.2 and part (1) of Lemma
2.3 (applied to both $X_f$ and $-X_f = X_{-f}$).
The second part of the corollary follows from Proposition 2.2 and part (2) of Lemma
2.3. □
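To make the rescaling behind the exponent $n^2 t^2$ explicit, here is a short worked derivation for part (2). It uses only Proposition 2.2 and the $n^{-1/2}$-Lipschitz estimate of Lemma 2.3; the auxiliary function $F$ is introduced here purely for illustration.

```latex
\begin{align*}
F(U) &:= \sqrt{n}\, d_1(\mu_U, \rho)
  \quad \text{is 1-Lipschitz on } G \text{ by Lemma 2.3(2), so} \\
\mathbb{P}\bigl[ d_1(\mu_U, \rho) - \mathbb{E} d_1(\mu_U, \rho) \ge t \bigr]
  &= \mathbb{P}\bigl[ F(U) - \mathbb{E}F(U) \ge \sqrt{n}\, t \bigr] \\
  &\le e^{-cn (\sqrt{n}\, t)^2} = e^{-cn^2 t^2}.
\end{align*}
```

Part (1) follows in the same way, with the factor 2 coming from applying the one-sided bound to both $X_f$ and $X_{-f}$.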

As noted earlier, the strategy outlined above does not apply directly to the full unitary 
and orthogonal groups, due to the lack of the concentration property of Proposition 2.2.
The results of Gromov-Milman and Bakry-Emery fail to apply to O(n) because it is not 
connected, and to U(n) because its Ricci tensor is degenerate. Nevertheless, the main results 
of this section can be extended to U(n) and O(n). In the orthogonal case this will be done 
by conditioning on det U, which is why it is convenient to consider also the case of random 
matrices in SO~(n). The unitary case could be handled in a similar way, but can also be 
deduced immediately from the special unitary case via the following lemma. 

Lemma 2.5. If $U \in U(n)$ and $V \in SU(n)$, then $d_1(\mu_U, \nu)$ and $d_1(\mu_V, \nu)$ are identically
distributed.

Proof. Define a coupling of $U$ and $V$ as follows. Let $V \in SU(n)$ be Haar-distributed, and
let $\omega \in \mathbb{S}^1$ be uniformly distributed independently of $V$. Define $U = \omega V$.

Now given any fixed $W \in U(n)$, $W = \xi Y$ for some $\xi \in \mathbb{S}^1$ and $Y \in SU(n)$, and thus

$UW = (\omega \xi)(VY)$

and

$WU = (\xi \omega)(YV)$

both have the same distribution as $\omega V = U$. Therefore $U \in U(n)$ is Haar-distributed.
It follows that

$d_1(\mu_U, \nu) = d_1(\mu_{\omega V}, \nu) = d_1(\mu_V, \nu)$

since $\mu_{\omega V}$ is a translation (in $\mathbb{S}^1$) of $\mu_V$ and $\nu$ is translation-invariant. □

An analogous statement to Lemma 2.5 holds for O(n) and SO(n) when $n$ is odd; in that
case $\omega, \xi \in \{-1, 1\}$ in the proof above. When $n$ is even, $-I_n \in SO(n)$ and so the argument
breaks down, requiring a different approach to deducing the main results for O(n).

The next result carries out the second step in the plan of this section. 

Theorem 2.6. Let $G$ be one of O(n), SO(n), $SO^-(n)$, U(n), SU(n), or Sp(2n), and let
$U \in G$. Then

(2.2)    $\mathbb{E} d_1(\mu_U, \nu) \le C n^{-2/3}.$
Proof. Assume for now that $G$ is one of SO(n), $SO^-(n)$, SU(n), or Sp(2n).

Let $\mathrm{Lip}_0(\mathbb{S}^1) = \{f \in \mathrm{Lip}(\mathbb{S}^1) : f(1) = 0\}$, and observe that the Lipschitz seminorm $|\cdot|_{\mathrm{Lip}}$
is a norm on this space; denote by $B(\mathrm{Lip}_0(\mathbb{S}^1))$ its unit ball. For $f : \mathbb{S}^1 \to \mathbb{R}$, define the
random variable

$X_f = \int f \, d\mu_U - \int f \, d\mu.$

Note that $\mathbb{E}X_f = 0$ for every $f$. Since the value of $X_f$ is unchanged by adding a constant
to $f$, by (1.2),

$d_1(\mu_U, \mu) = \sup\{X_f \mid f \in B(\mathrm{Lip}_0(\mathbb{S}^1))\}.$

Fix $m \in \mathbb{N}$, to be determined later, and let $\mathrm{Lip}_0^m(\mathbb{S}^1)$ be the $(m-1)$-dimensional subspace
of $\mathrm{Lip}_0(\mathbb{S}^1)$ consisting of functions which, when interpreted instead as $2\pi$-periodic functions
on $\mathbb{R}$, are affine on each subinterval $\left[\frac{2\pi k}{m}, \frac{2\pi (k+1)}{m}\right]$ for $k \in \mathbb{Z}$. Given $f \in B(\mathrm{Lip}_0(\mathbb{S}^1))$, there
is a unique $g \in B(\mathrm{Lip}_0^m(\mathbb{S}^1))$ such that $g(\exp(2\pi i k/m)) = f(\exp(2\pi i k/m))$ for every $k$. Then
$\|f - g\|_\infty \le \frac{\pi}{m}$, so that

$X_f - X_g \le \frac{2\pi}{m}$

almost surely. It follows that

(2.3)    $d_1(\mu_U, \mu) \le \sup\{X_g \mid g \in B(\mathrm{Lip}_0^m(\mathbb{S}^1))\} + \frac{2\pi}{m}.$

By Corollary 2.4, for $g, h \in B(\mathrm{Lip}_0^m(\mathbb{S}^1))$,

$\mathbb{P}[|X_g - X_h| \ge t] = \mathbb{P}[|X_{g-h}| \ge t] \le 2 e^{-cn^2t^2 / |g-h|_{\mathrm{Lip}}^2}$

for every $t > 0$. Thus by Dudley's entropy bound [8],

(2.4)    $\mathbb{E} \sup\{X_g \mid g \in B(\mathrm{Lip}_0^m(\mathbb{S}^1))\} \le \frac{C}{n} \int_0^\infty \sqrt{\log N(B(\mathrm{Lip}_0^m(\mathbb{S}^1)), |\cdot|_{\mathrm{Lip}}, \varepsilon)} \, d\varepsilon,$

where $N(B(\mathrm{Lip}_0^m(\mathbb{S}^1)), |\cdot|_{\mathrm{Lip}}, \varepsilon)$ denotes the minimum number of $\varepsilon$-balls with respect to $|\cdot|_{\mathrm{Lip}}$
needed to cover $B(\mathrm{Lip}_0^m(\mathbb{S}^1))$. (For a very neat exposition of Dudley's bound, see Section
1.2 of [29].) Since $B(\mathrm{Lip}_0^m(\mathbb{S}^1))$ is itself a ball with respect to the norm $|\cdot|_{\mathrm{Lip}}$, there is the
standard volumetric estimate [24, Lemma 2.6]

$N(B(\mathrm{Lip}_0^m(\mathbb{S}^1)), |\cdot|_{\mathrm{Lip}}, \varepsilon) \le \left(\frac{3}{\varepsilon}\right)^{m-1}$

for $0 < \varepsilon \le 1$. Inserting this into (2.4) and then inserting the resulting estimate into (2.3) yields

$\mathbb{E} d_1(\mu_U, \mu) \le C \frac{\sqrt{m}}{n} + \frac{2\pi}{m}.$

Picking $m$ of the order $n^{2/3}$ yields that

$\mathbb{E} d_1(\mu_U, \mu) \le \frac{C}{n^{2/3}},$

and so the theorem (except for the cases of O(n) and U(n)) follows by Theorem 2.1 and the
triangle inequality for $d_1$.

If $G = U(n)$, then the theorem follows from Lemma 2.5 and the case of SU(n).

If $G = O(n)$, then conditionally on $\det U$, $U$ is Haar-distributed in either SO(n) or
$SO^-(n)$. Since

$\mathbb{E} d_1(\mu_U, \nu) = \mathbb{E}\left(\mathbb{E}[d_1(\mu_U, \nu) \mid \det U]\right),$

the theorem follows from the cases of SO(n) and $SO^-(n)$. □
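The choice of $m$ in the proof of Theorem 2.6 can be checked by a short computation. The sketch below assumes the covering-number bound in the form $N \le (3/\varepsilon)^{m-1}$ for $0 < \varepsilon \le 1$, with $N = 1$ for $\varepsilon > 1$ because the unit ball is then covered by a single $\varepsilon$-ball; this specific form of the volumetric estimate, and the choice $m = \lceil n^{2/3} \rceil$, are assumptions made for illustration.

```latex
\begin{align*}
\int_0^\infty \sqrt{\log N(B, |\cdot|_{\mathrm{Lip}}, \varepsilon)} \, d\varepsilon
  &\le \sqrt{m-1} \int_0^1 \sqrt{\log(3/\varepsilon)} \, d\varepsilon
   \le C \sqrt{m}, \\
\mathbb{E} d_1(\mu_U, \mu)
  &\le C \frac{\sqrt{m}}{n} + \frac{2\pi}{m}, \\
m = \lceil n^{2/3} \rceil \le 2 n^{2/3}
  &\implies
  \frac{\sqrt{m}}{n} \le \sqrt{2}\, n^{-2/3}
  \quad \text{and} \quad
  \frac{2\pi}{m} \le 2\pi\, n^{-2/3}.
\end{align*}
```

The two error terms balance exactly when $\sqrt{m}/n$ and $1/m$ are of the same order, which is what forces $m$ to be of order $n^{2/3}$.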



A direct union bound argument can also be used in place of Dudley's theorem in the proof
of Theorem 2.6, but the argument given above is considerably more elegant and concise.

The next two results complete the plan of this section. 

Corollary 2.7. Let $G$ be one of O(n), SO(n), $SO^-(n)$, U(n), SU(n), or Sp(2n), and let
$U \in G$. Then

$\mathbb{P}\left[d_1(\mu_U, \nu) \ge C n^{-2/3} + t\right] \le e^{-cn^2t^2}$

for every $t > 0$.



Proof. This follows immediately from Proposition 2.2 and Theorem 2.6 except in the cases
of O(n) and U(n). If $G = U(n)$, the corollary follows from Lemma 2.5 and the case of
SU(n). If $G = O(n)$, then

$\mathbb{P}\left[d_1(\mu_U, \nu) \ge C n^{-2/3} + t\right] = \mathbb{E}\left(\mathbb{P}\left[d_1(\mu_U, \nu) \ge C n^{-2/3} + t \,\middle|\, \det U\right]\right)$

and the corollary follows from the cases of SO(n) and $SO^-(n)$. □

Corollary 2.8. For each $n$ let $G_n$ be one of O(n), SO(n), $SO^-(n)$, U(n), SU(n), or Sp(2n),
and let $U_n \in G_n$. Then with probability 1,

$d_1(\mu_{U_n}, \nu) \le C n^{-2/3}$

for all sufficiently large $n$.

Proof. Let $t = n^{-2/3}$ in Corollary 2.7 and apply the Borel-Cantelli lemma. □
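The Borel-Cantelli step can be spelled out as follows. This is a worked version of the one-line proof above, assuming only the tail bound of Corollary 2.7 with $t = n^{-2/3}$; the constant $C + 1$ is then absorbed into the constant of Corollary 2.8.

```latex
\begin{align*}
\sum_{n \ge 1} \mathbb{P}\bigl[ d_1(\mu_{U_n}, \nu) \ge (C+1)\, n^{-2/3} \bigr]
  &\le \sum_{n \ge 1} \exp\bigl( -c\, n^2 \cdot n^{-4/3} \bigr)
   = \sum_{n \ge 1} e^{-c\, n^{2/3}} < \infty.
\end{align*}
```

Since the series converges, with probability 1 the event $d_1(\mu_{U_n}, \nu) \ge (C+1)\, n^{-2/3}$ occurs for only finitely many $n$.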

The main results of this section can all be extended to Dyson's circular ensembles (for
extensive discussion, see [23]), by a slight variation of the same methods. The Circular
Unitary Ensemble CUE(n) is the same as the Haar distribution on U(n). The Circular
Orthogonal Ensemble COE(n) is distributed as $V^T V$, where $V$ is Haar-distributed in U(n).
The Circular Symplectic Ensemble CSE(2n) is distributed as $J V^T J^T V$, where $V$ is
Haar-distributed in U(2n) and

$J := \begin{pmatrix} 0 & -1 & & & \\ 1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & -1 \\ & & & 1 & 0 \end{pmatrix}.$

Theorem 2.9. Let $U$ be drawn from COE(n) or CSE(2n). Then

$\mathbb{E}\mu_U = \nu,$

$\mathbb{E} d_1(\mu_U, \nu) \le \frac{C}{n^{2/3}},$

and

$\mathbb{P}\left[d_1(\mu_U, \nu) \ge C n^{-2/3} + t\right] \le e^{-cn^2t^2}$

for every $t > 0$.

If, for each $n$, $U_n$ is drawn from COE(n) or CSE(2n), then with probability 1,

$d_1(\mu_{U_n}, \nu) \le C n^{-2/3}$

for all sufficiently large $n$.

Proof. For brevity the proof is given only in the case of the COE, the argument for the CSE
being nearly identical.

Let $U = V^T V$, where $V \in U(n)$ is Haar-distributed, and fix $e^{i\theta} \in \mathbb{S}^1$. Then $e^{i\theta/2} V$ is
also Haar-distributed in U(n), so $U$ has the same distribution as $(e^{i\theta/2} V)^T (e^{i\theta/2} V) = e^{i\theta} U$.
Therefore $\mathbb{E}\mu_U$ is a rotation-invariant probability measure on $\mathbb{S}^1$, and is hence equal to $\nu$.

Next, arguing as in the proof of Lemma 2.5, $U$ has the same distribution as $(\omega W)^T (\omega W) =
\omega^2 W^T W$, where $W \in SU(n)$ is Haar-distributed and $\omega \in \mathbb{S}^1$ is uniformly distributed
independently of $W$. Since $\omega^2$ is distributed as $\omega$, $U$ has the same distribution as $\omega W^T W$. As in
the proof of Lemma 2.5, it follows that $d_1(\mu_U, \nu)$ has the same distribution as $d_1(\mu_{W^T W}, \nu)$.

Now given $W_1, W_2 \in SU(n)$,

$\|W_1^T W_1 - W_2^T W_2\|_{HS} \le \|W_1^T (W_1 - W_2)\|_{HS} + \|(W_1^T - W_2^T) W_2\|_{HS}$
$= \|W_1 - W_2\|_{HS} + \|W_1^T - W_2^T\|_{HS} = 2 \|W_1 - W_2\|_{HS}.$

Thus the map $SU(n) \to SU(n)$ given by $W \mapsto W^T W$ is 2-Lipschitz, and so by Proposition
2.2,

$\mathbb{P}[F(W^T W) - \mathbb{E}F(W^T W) \ge t] \le e^{-cnt^2}$

for every $t > 0$ and every 1-Lipschitz function $F : SU(n) \to \mathbb{R}$.

The remainder of the proof is the same as the proofs of Theorem 2.6, Corollary 2.7, and
Corollary 2.8. □



3. Some random Hermitian matrices 

In this section, we prove results comparable to Theorem 2.6 and Corollaries 2.7 and
2.8 for two models of Hermitian random matrices. An essential condition on some of the
random matrices used in the constructions below is the following. 

Let $A$ be a random $n \times n$ Hermitian matrix. Suppose that for some $C, c > 0$,

(3.1)    $\mathbb{P}[|F(A) - \mathbb{E}F(A)| \ge t] \le C \exp[-cnt^2]$

for every $t > 0$ and $F : M_n^{sa} \to \mathbb{R}$ which is 1-Lipschitz with respect to the Hilbert-Schmidt
norm. Examples in which this condition is satisfied include:

(1) The diagonal and upper-diagonal entries of $A$ are independent and each satisfy
a quadratic transportation cost inequality with constant $c/\sqrt{n}$. This is slightly
more general than assuming a log-Sobolev inequality (see [18, Section 6.2]), and is
essentially the most general condition with independent entries (see [11]). It holds,
e.g., for Gaussian entries and, more generally, for entries with densities of the form
$e^{-n u_{ij}(x)}$ where $u_{ij}''(x) \ge c > 0$.

(2) The distribution of $A$ itself has a density proportional to $e^{-n\, \mathrm{tr}\, u(A)}$ with $u : \mathbb{R} \to \mathbb{R}$
such that $u''(x) \ge c > 0$. This is a subclass of the so-called unitarily invariant
ensembles, studied extensively in mathematical physics (see [5]). The hypothesis
on $u$, via the Bakry-Emery theorem, guarantees that $A$ satisfies a log-Sobolev
inequality; cf. [1, Proposition 4.4.26].
One could also consider the situation in which (3.1) is only assumed to hold for convex
Lipschitz functions $F$. By Talagrand's theorem (see e.g. [18, Section 4.2]), this is the case
if the diagonal and upper-diagonal entries of $A$ are independent and supported in sets of
diameter at most $c/\sqrt{n}$. Under this weaker condition, the arguments below can be applied
to prove results analogous to Theorem 2.6 and Corollaries 2.7 and 2.8, not for $d_1(\mu_M, \mathbb{E}\mu_M)$
but for a "convex-Wasserstein distance" defined by

$d_1^c(\mu, \nu) := \sup \left\{ \left| \int f \, d\mu - \int f \, d\nu \right| : f \in B(\mathrm{Lip}(\mathbb{R})),\ f \text{ convex} \right\}.$

This distance is also a metric for weak convergence of laws (see, e.g., the proof of [21,
Theorem 2]).

The first model of random Hermitian matrix considered in this section is the following.
Let $U \in U(n)$ be distributed according to Haar measure, independent of $A$, and let $P_k$ denote
the projection of $\mathbb{C}^n$ onto the span of the first $k$ basis elements. Define a random matrix
$M$ by

(3.2)    $M := P_k U A U^* P_k^*.$

Then $M$ is a compression of $A$ (as an operator on $\mathbb{C}^n$) to a random $k$-dimensional subspace
chosen independently of $A$. In the case that $\{A_n\}_{n \in \mathbb{N}}$ is a deterministic sequence of matrices
with a limiting spectral distribution and $\frac{k}{n} \to \alpha$, the limiting spectral distribution of $M$
can be determined using techniques of free probability (see [28]); the limit is given by a
free-convolution power related to the limiting spectral distribution of $A_n$ and the value $\alpha$.
The concentration properties of the spectral distribution of $M$ for $A$ deterministic were
treated in [20], and the results below improve on those appearing in that paper.

In the case that $k = n$, the empirical spectral measure $\mu_M$ of $M$ is the same as $\mu_A$; in
particular, if $A$ satisfies a log-Sobolev inequality and $k = n$, then the results below on the
concentration of $\mu_M$ about its mean improve on the comparable results of Guionnet and
Zeitouni from [13], both in terms of the specific bounds and in the metric used. (The metric
used in [13], although referred to there as Wasserstein, is more commonly referred to as the
bounded-Lipschitz distance and metrizes a slightly weaker topology than the metric used
here.) We show below that the expected Wasserstein distance of $\mu_M$ to $\mathbb{E}\mu_M$ is of order
$n^{-2/3}$, whereas what follows from the results of [13] is that the expected bounded-Lipschitz
distance of $\mu_M$ to $\mathbb{E}\mu_M$ is of order $n^{-2/5}$.

In the further special case that the entries on and above the diagonal are assumed to 
be independent, the results below have been surpassed (in Kolmogorov distance) in the 
very recent work of Götze and Tikhomirov [10], who proved for such matrices that the
Kolmogorov distance between the empirical spectral distribution and the semicircular law
is almost surely of order $O(n^{-1} \log^b n)$ with some positive constant $b > 0$, under mild
conditions on the distributions of the entries. 

The proofs below follow the same approach as described in the final two steps of the
outline given in Section 2. Namely, measure concentration, both on U(n) and from the
hypothesis of (3.1), is used together with entropy methods to show that $\mathbb{E} d_1(\mu_M, \mu)$ is
small, and moreover that $d_1(\mu_M, \mu)$ is strongly concentrated near its mean. Here, $\mu_M$ is
again the empirical spectral measure of $M$ and $\mu = \mathbb{E}\mu_M$; in this section, $\mu$ is always used
as a reference measure. An additional truncation argument will be necessary, since the
support of $\mu_M$ is not necessarily uniformly bounded in this context.
The following lemma is proved using a standard discretization argument.

Lemma 3.1 (cf. [22, Proof of Proposition 4]). Suppose that $\|\mathbb{E}A\|_{op} \le C'$ and $A$ satisfies
(3.1) for every convex 1-Lipschitz function $F : M_n^{sa} \to \mathbb{R}$. Then there is a constant $K$
depending only on $C$, $c$, $C'$ such that

$\mathbb{E}\|A\|_{op} \le K.$

Observe that it follows from Lemma 3.1 that $\mathbb{E}\|M\|_{op} \le K$ for $M$ defined in (3.2).
The next preliminary lemma and corollary are needed to obtain concentration properties 
for M from those of A and U. 

Lemma 3.2. Let $A \in M_n^{sa}$ be fixed. The map $U(n) \to M_n^{sa}$, $U \mapsto U A U^*$ is $\delta(A)$-Lipschitz.

Proof. For $\lambda \in \mathbb{R}$, let $A_\lambda = A - \lambda I$. For any $U, V \in U(n)$,

$\|U A U^* - V A V^*\|_{HS} = \|U A_\lambda U^* - V A_\lambda V^*\|_{HS}$
$= \|U A_\lambda (U - V)^* + (U - V) A_\lambda V^*\|_{HS}$
$\le \|U A_\lambda (U - V)^*\|_{HS} + \|(U - V) A_\lambda V^*\|_{HS}$
$\le \|U A_\lambda\|_{op} \|(U - V)^*\|_{HS} + \|U - V\|_{HS} \|A_\lambda V^*\|_{op}$
$= 2 \|A_\lambda\|_{op} \|U - V\|_{HS}.$

Here we have used the facts that

(1) $\|U\|_{op} = 1$ for $U \in U(n)$,

(2) $\|AB\|_{op} \le \|A\|_{op} \|B\|_{op}$ for $A, B \in M_n$, and

(3) $\|AB\|_{HS} \le \|A\|_{op} \|B\|_{HS}$ for $A, B \in M_n$.

Recalling that $\delta(A) = 2 \inf_\lambda \|A_\lambda\|_{op}$, optimizing over $\lambda$ proves the lemma. □
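The Lipschitz constant $\delta(A)$ can be checked numerically in the smallest case. The sketch below is an illustration under simplifying assumptions: it takes $n = 2$, a diagonal matrix $A = \mathrm{diag}(\lambda_1, \lambda_2)$ so that $\delta(A) = |\lambda_1 - \lambda_2|$, and restricts $U, V$ to real rotation matrices, which form a subgroup of U(2). All function names are hypothetical.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, an element of SO(2) and hence of U(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def hs_norm(a):
    return math.sqrt(sum(x * x for row in a for x in row))


def diff(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def conjugate(u, a):
    """U A U^*, with U^* = U^T because U is real."""
    return matmul(matmul(u, a), transpose(u))


def lipschitz_gap(lam1, lam2, theta, phi):
    """delta(A) ||U - V||_HS - ||U A U^* - V A V^*||_HS for A = diag(lam1, lam2)."""
    a = [[lam1, 0.0], [0.0, lam2]]
    u, v = rotation(theta), rotation(phi)
    lhs = hs_norm(diff(conjugate(u, a), conjugate(v, a)))
    return abs(lam1 - lam2) * hs_norm(diff(u, v)) - lhs
```

The lemma predicts that `lipschitz_gap` is nonnegative for all angles. In this family the gap tends to zero as the two angles approach each other, so the constant $\delta(A)$ is attained in the limit.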

In [20] a weaker result is proved, essentially using instead of the third fact above the
weaker estimate $\|AB\|_{HS} \le \|A\|_{HS} \|B\|_{HS}$.

Corollary 3.3. Let $A \in M_n^{sa}$ be fixed and let $1 \le k \le n$. Then the map $U(n) \to M_k^{sa}$ given
by $U \mapsto P_k U A U^* P_k^*$ is $\delta(A)$-Lipschitz.

Proof. Combine Lemma 3.2 with the obvious fact that $A \mapsto P_k A P_k^*$ is 1-Lipschitz
$M_n^{sa} \to M_k^{sa}$ (since $P_k A P_k^*$ is just a submatrix of $A$). □

Theorem 3.4. Suppose that $A$ satisfies (3.1) for every 1-Lipschitz function $F : M_n^{sa} \to \mathbb{R}$.

(1) If $F : M_n^{sa} \to \mathbb{R}$ is 1-Lipschitz, then for $M = P_k U A U^* P_k^*$,

$\mathbb{P}[|F(M) - \mathbb{E}F(M)| \ge t] \le C \exp[-cnt^2]$

for every $t > 0$.

(2) In particular,

$\mathbb{P}\left[\left| \|M\|_{op} - \mathbb{E}\|M\|_{op} \right| \ge t\right] \le C \exp[-cnt^2]$

for every $t > 0$.

(3) For any fixed probability measure $\mu \in \mathcal{P}_2(\mathbb{C})$ and 1-Lipschitz $f : \mathbb{R} \to \mathbb{R}$, if

$X_f = \int f \, d\mu_M - \int f \, d\mu,$

then

$\mathbb{P}[|X_f - \mathbb{E}X_f| \ge t] \le C e^{-cknt^2}$

for every $t > 0$.

(4) For any fixed probability measure $\mu \in \mathcal{P}_2(\mathbb{C})$ and $1 \le p \le 2$,

$\mathbb{P}[|d_p(\mu_M, \mu) - \mathbb{E} d_p(\mu_M, \mu)| \ge t] \le C e^{-cknt^2}$

for every $t > 0$.
Proof. For the first part, observe that

$\mathbb{P}[|F(M) - \mathbb{E}F(M)| \ge t] \le \mathbb{E}\, \mathbb{P}\left[ |F(M) - \mathbb{E}[F(M) \mid U]| \ge \frac{t}{2} \,\middle|\, U \right] + \mathbb{P}\left[ |\mathbb{E}[F(M) \mid U] - \mathbb{E}F(M)| \ge \frac{t}{2} \right].$

Conditional on $U$, $F(M)$ is a 1-Lipschitz function of $A$, and by taking expectation over
$A$ in Corollary 3.3, it follows that $\mathbb{E}[F(M) \mid U]$ is an $\mathbb{E}[\delta(A)]$-Lipschitz function of $U$. The
first part thus follows from the hypothesis on $A$ and Lemma 3.1. Part (2) follows from
part (1) and the fact that the operator norm is a 1-Lipschitz function with respect to the
Hilbert-Schmidt norm on $M_n^{sa}$. The remaining parts follow from Lemma 2.3 and part
(1). □

To estimate $\mathbb{E} d_1(\mu_M, \mu)$ (where, as before, $\mu = \mathbb{E}\mu_M$) the arguments in the previous
section can be supplemented with a truncation argument using the lemma above to obtain 
the following. 



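Theorem 3.5. Suppose that $A$ satisfies (3.1) for every 1-Lipschitz function $F : M_n^{sa} \to \mathbb{R}$.
Let $M = P_k U A U^* P_k^*$, and let $\mu_M$ denote the empirical spectral distribution of $M$ with
$\mu = \mathbb{E}\mu_M$. Then

$\mathbb{E} d_1(\mu_M, \mu) \le \frac{C'' (\mathbb{E}\|M\|_{op})^{1/3}}{(kn)^{1/3}},$

and so

$\mathbb{P}\left[ d_1(\mu_M, \mu) \ge \frac{C'' (\mathbb{E}\|M\|_{op})^{1/3}}{(kn)^{1/3}} + t \right] \le C e^{-cknt^2}$

for each $t > 0$.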

Proof. Denote by $\mathrm{Lip}_0(\mathbb{R}) = \{f \in \mathrm{Lip}(\mathbb{R}) : f(0) = 0\}$, and observe that

$\mathbb{E} d_1(\mu_M, \mu) = \mathbb{E} \sup\{X_f : f \in B(\mathrm{Lip}_0(\mathbb{R}))\},$

where

$X_f := \int f \, d\mu_M - \int f \, d\mu$

as before. The indexing space can be reduced to compactly supported functions via a
truncation argument, as follows. Fix $R > 0$, and let

$f_R(x) = \begin{cases} f(x) & \text{if } |x| \le R; \\ f(R) + [\mathrm{sgn}(f(R))](R - x) & \text{if } R < x < R + |f(R)|; \\ f(-R) + [\mathrm{sgn}(f(-R))](x + R) & \text{if } -|f(-R)| - R < x < -R; \\ 0 & \text{if } x \le -R - |f(-R)| \text{ or } x \ge R + |f(R)|; \end{cases}$

that is, $f_R = f$ for $|x| \le R$ and then drops off linearly to zero, so that $f_R$ is 1-Lipschitz,
$f_R(x) = 0$ for $|x| > 2R$, and $|f(x) - f_R(x)| \le |x|$ for all $x \in \mathbb{R}$. Then by Fubini's theorem,

$\left| \int f \, d\mu_M - \int f_R \, d\mu_M \right| \le \int_{|x| > 2R} |x| \, d\mu_M(x) \le 2R \int_{|x| > 2R} d\mu_M(x) + \int_{2R}^\infty \mu_M((t, \infty)) \, dt + \int_{-\infty}^{-2R} \mu_M((-\infty, t)) \, dt.$

Taking the supremum over $f$ followed by expectation over $M$, and making use of part (2)
of Theorem 3.4 together with the trivial bound $\mathbb{E}\mu_M((-\infty, -t) \cup (t, \infty)) \le n \mathbb{P}[\|M\|_{op} > t]$
yields

$\mathbb{E} \sup\left\{ \left| \int (f - f_R) \, d\mu_M \right| : f \in B(\mathrm{Lip}_0(\mathbb{R})) \right\} \le C R n \exp\left[ -cn (2R - \mathbb{E}\|M\|_{op})^2 \right],$

and the same holds if $\mu_M$ is replaced by $\mu$. Taking, for example, $2R = \mathbb{E}\|M\|_{op} + 1$ gives
that

$\mathbb{E} \sup\{|X_f - X_{f_R}| : f \in B(\mathrm{Lip}_0(\mathbb{R}))\} \le C n (\mathbb{E}\|M\|_{op}) e^{-cn}.$

Consider therefore the process $X_f$ indexed by $\mathrm{Lip}_{1, \mathbb{E}\|M\|_{op} + 1}$ (with norm $|\cdot|_{\mathrm{Lip}}$), where

$\mathrm{Lip}_{a,b} := \{f : \mathbb{R} \to \mathbb{R} : |f|_{\mathrm{Lip}} \le a;\ f(x) = 0 \text{ if } |x| > b\}.$

The above argument shows that

(3.3)    $\mathbb{E}[d_1(\mu_M, \mu)] \le \mathbb{E}\left[ \sup\{X_f : f \in \mathrm{Lip}_{1, \mathbb{E}\|M\|_{op} + 1}\} \right] + C n (\mathbb{E}\|M\|_{op}) e^{-cn}.$

Now that the indexing space of the process has been reduced to compactly supported
functions, the proof can be completed exactly as in the case of Theorem 2.6; the additional
error incurred by the truncation above is negligible compared to the errors produced by
the earlier argument. The factor $(\mathbb{E}\|M\|_{op})^{1/3}$ in the final bound is due to the size of
the truncation parameter $R$ (in the proof of Theorem 2.6, the corresponding quantity was
simply $2\pi$ and therefore disappeared into the constants in the statement). □
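The truncation $f \mapsto f_R$ used in this proof is easy to implement and test. The following sketch assumes, as in the proof, that $f$ is 1-Lipschitz with $f(0) = 0$, so that $|f(\pm R)| \le R$ and the truncated function vanishes outside $[-2R, 2R]$. The name `truncate` is illustrative, and both tails are written in a single branch using the distance $|x| - R$ travelled beyond the endpoint, which keeps the function continuous at $\pm R$.

```python
import math


def truncate(f, R):
    """Return f_R: equal to f on [-R, R], then sloping to zero at unit rate."""
    def f_R(x):
        if abs(x) <= R:
            return f(x)
        edge = f(R) if x > 0 else f(-R)  # value of f at the nearer endpoint
        remaining = abs(edge) - (abs(x) - R)  # height left after unit-slope descent
        return math.copysign(max(remaining, 0.0), edge)
    return f_R


# Hypothetical test function: sin is 1-Lipschitz and vanishes at 0.
g = truncate(math.sin, 2.0)
```

On a grid one can confirm the three properties used in the proof: `g` agrees with `math.sin` on $[-2, 2]$, vanishes for $|x| \ge 4$, and changes by at most the step size between neighbouring grid points.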

Corollary 3.6. For each n, let A n G M* a be fixed with spectrum bounded independently of 
n. Let U n G U(n) be Haar- distributed and fix k. Let M n = P^U A n U* P£ and let fi n = E/j,m„- 
Then with probability 1, 

di{HM n ,Vn) < Cn~ 1/?J , 
where C depends only on k and the bounds on the sizes of the spectra of A n . 

Proof. This follows from Theorem 3.5 using $t = n^{-1/3}$ and the Borel-Cantelli lemma. □
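For readers who wish to experiment with the compression model, the following NumPy sketch samples the spectrum of $P_k U A U^* P_k^*$. It is an illustration only: the function names are hypothetical, $P_k$ is taken to be the coordinate projection onto the first $k$ coordinates (an assumption, since its definition lies outside this passage), and the Haar unitary is generated by the standard phase-corrected QR factorization of a complex Gaussian matrix.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Rescale each column by the phase of the matching diagonal entry of r,
    # which removes the non-uniqueness of the QR factorization.
    return q * (d / np.abs(d))

def compression_spectrum(a, k, rng):
    """Eigenvalues of the top-left k x k block of U A U* (P_k assumed coordinate projection)."""
    u = haar_unitary(a.shape[0], rng)
    m = (u @ a @ u.conj().T)[:k, :k]
    return np.linalg.eigvalsh(m)

rng = np.random.default_rng(0)
a = np.diag(np.linspace(-1.0, 1.0, 50))   # illustrative fixed Hermitian A
eigs = compression_spectrum(a, 5, rng)
```

By Cauchy interlacing the sampled eigenvalues always lie between the smallest and largest eigenvalues of $A$, which gives a quick sanity check on the sampler.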



The second model of random matrix considered in this section is defined as follows.
Let $A, B \in M_n^{sa}$ satisfy condition (3.1), let $U \in \mathbb{U}(n)$ be Haar-distributed, with $A, B, U$
independent. Define

\[
M = UAU^* + B,
\]

the "randomized sum" of $A$ and $B$. In the case of deterministic sequences $\{A_n\}$ and $\{B_n\}$,
this model has been studied at some length. The limiting spectral measure was studied
first by Voiculescu [30] and Speicher [27], who showed that if $\{A_n\}$ and $\{B_n\}$ have limiting
eigenvalue distributions $\mu_A$ and $\mu_B$ respectively, and if $M_n := U A_n U^* + B_n$, then the
limiting spectral distribution of $M_n$ is given by the free convolution $\mu_A \boxplus \mu_B$. More recently,
Chatterjee [I] showed subexponential concentration (up to a logarithmic factor) of $\mu_{M_n}$
about its mean; Kargin [16] improved this to subgaussian concentration (again up to a
logarithmic factor), and was furthermore able to consider the distance to $\mu_{A_n} \boxplus \mu_{B_n}$ itself,
rather than $\mathbb{E}\mu_{M_n}$. Theorem 3.8 below gives a similar level of concentration to Kargin's
result. The main differences are that here the reference measure is $\mathbb{E}\mu_{M_n}$ rather than a free
convolution; the matrices $A_n$ and $B_n$ may be random here, whereas Kargin's result requires
$A_n$ and $B_n$ to be deterministic; and Kargin's result is in terms of Kolmogorov distance,
rather than Wasserstein distance.
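As a purely illustrative aid, the randomized sum can be sampled numerically as follows. The function names and the particular diagonal choices of $A$ and $B$ are hypothetical and are not taken from the paper; the Haar unitary is generated by the standard phase-corrected QR factorization of a complex Gaussian matrix.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Phase correction so that the resulting distribution is exactly Haar.
    return q * (d / np.abs(d))

def randomized_sum(a, b, rng):
    """Return M = U A U* + B for a freshly sampled Haar-distributed U."""
    u = haar_unitary(a.shape[0], rng)
    return u @ a @ u.conj().T + b

rng = np.random.default_rng(0)
n = 200
a = np.diag(rng.choice([-1.0, 1.0], size=n))    # illustrative A (hypothetical choice)
b = np.diag(rng.uniform(-1.0, 1.0, size=n))     # illustrative B (hypothetical choice)
eigs = np.linalg.eigvalsh(randomized_sum(a, b, rng))
```

A histogram of `eigs` for large `n` gives an empirical picture of the spectral measure $\mu_M$. Two deterministic sanity checks are available: the eigenvalues sum to $\operatorname{tr} A + \operatorname{tr} B$, and by Weyl's inequalities they lie between $\lambda_{\min}(A) + \lambda_{\min}(B)$ and $\lambda_{\max}(A) + \lambda_{\max}(B)$.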

The proofs below once again follow the same approach as described in the final two steps
of the outline given in Section 2.

Note that by Weyl's inequalities [3, Theorem III.2.1], the spectrum of $M$ always lies
in the interval $[\lambda_{\min}(A) + \lambda_{\min}(B), \lambda_{\max}(A) + \lambda_{\max}(B)]$, of length $\delta(A) + \delta(B)$, and so by
Lemma 3.1, $\mathbb{E}\|M\|_{op}$ is bounded in terms of the constants in (3.1) for $A$ and $B$. We also
have the following analog of Theorem 3.4.

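Theorem 3.7 (cf. [1, Corollary 4.4.30]). Let $A, B \in M_n^{sa}$ satisfy (3.1) and let $U \in \mathbb{U}(n)$
be Haar-distributed with $A, B, U$ independent. Define $M = UAU^* + B$.

(1) There exist $C, c$ depending only on the constants in (3.1) for $A$ and $B$, such that if
$F : M_n^{sa} \to \mathbb{R}$ is 1-Lipschitz, then

\[
\mathbb{P}\left[ |F(M) - \mathbb{E}F(M)| \ge t \right] \le C \exp\left[ -cnt^2 \right]
\]

for every $t > 0$.

(2) In particular,

\[
\mathbb{P}\left[ \left| \|M\|_{op} - \mathbb{E}\|M\|_{op} \right| \ge t \right] \le C \exp\left[ -cnt^2 \right]
\]

for every $t > 0$.

(3) For any fixed probability measure $\rho \in \mathcal{P}_2(\mathbb{R})$ and 1-Lipschitz $f : \mathbb{R} \to \mathbb{R}$, let

\[
X_f := \int f \, d\mu_M - \int f \, d\rho.
\]

Then

\[
\mathbb{P}\left[ |X_f - \mathbb{E}X_f| \ge t \right] \le C \exp\left[ -cn^2t^2 \right]
\]

for every $t > 0$.

(4) For any fixed probability measure $\rho \in \mathcal{P}_2(\mathbb{R})$ and $1 \le p \le 2$,

\[
\mathbb{P}\left[ |d_p(\mu_M, \rho) - \mathbb{E}\, d_p(\mu_M, \rho)| \ge t \right] \le C \exp\left[ -cn^2t^2 \right]
\]

for every $t > 0$.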
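The change from $n$ to $n^2$ in the exponent between part (1) and parts (3) and (4) is a rescaling effect. The following short derivation is a sketch, assuming (as the proof below indicates via Lemma 2.3) that the relevant function of $M$ is Lipschitz with constant $L = n^{-1/2}$; the symbols $G$ and $L$ are introduced here only for illustration.

```latex
% Sketch: G : M_n^{sa} -> R is assumed L-Lipschitz, so G/L is 1-Lipschitz
% and part (1) applies to F = G/L with t replaced by t/L.
\[
\mathbb{P}\left[ |G(M) - \mathbb{E}G(M)| \ge t \right]
  = \mathbb{P}\left[ \left| \tfrac{1}{L}G(M) - \mathbb{E}\tfrac{1}{L}G(M) \right| \ge \tfrac{t}{L} \right]
  \le C \exp\left[ -\frac{c\,n\,t^2}{L^2} \right],
\]
\[
L = n^{-1/2} \quad \Longrightarrow \quad
\mathbb{P}\left[ |G(M) - \mathbb{E}G(M)| \ge t \right] \le C \exp\left[ -c\,n^2 t^2 \right].
\]
```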



Proof. (1) By the coupling described in the proof of Lemma 2.5 we may equivalently
define

\[
M = (\omega V) A (\omega V)^* + B = VAV^* + B
\]

for $\omega$ and $V$ independent with $\omega$ uniformly distributed in $\mathbb{S}^1$ and $V$ Haar-distributed
in $\mathbb{SU}(n)$. Now,

\[
\begin{aligned}
\mathbb{P}\left[ |F(M) - \mathbb{E}F(M)| \ge t \right]
  &\le \mathbb{E}\, \mathbb{P}\left[ \left| F(M) - \mathbb{E}[F(M) \mid A, V] \right| \ge \tfrac{t}{3} \,\middle|\, A, V \right] \\
  &\quad + \mathbb{E}\, \mathbb{P}\left[ \left| \mathbb{E}[F(M) \mid A, V] - \mathbb{E}[F(M) \mid V] \right| \ge \tfrac{t}{3} \,\middle|\, V \right] \\
  &\quad + \mathbb{P}\left[ \left| \mathbb{E}[F(M) \mid V] - \mathbb{E}F(M) \right| \ge \tfrac{t}{3} \right].
\end{aligned}
\]

Conditional on $A$ and $V$, $F(M)$ is a 1-Lipschitz function of $B$, and by independence,
the distribution of $B$ is unchanged by conditioning on $A$ and $V$. The conditional
distribution of $B$ therefore still satisfies the concentration hypothesis and so the first
summand above is bounded as desired. Similarly, conditional on $V$, $\mathbb{E}[F(M) \mid A, V]$
is a 1-Lipschitz function of $A$, and the bound on the second summand follows from
independence and the concentration hypothesis for $A$. By Corollary 3.3, $M$ is $\delta(A)$-Lipschitz
as a function of $V$; it follows that $\mathbb{E}[F(M) \mid V]$ is an $\mathbb{E}[\delta(A)]$-Lipschitz
function of $V$, and the claim then follows from Lemma 3.1 and Proposition 2.2.

(2) This follows from the previous part and the fact that the operator norm is a
1-Lipschitz function with respect to the Hilbert-Schmidt norm on $M_n^{sa}$.

(3) As a function of $\mu_M \in \mathcal{P}_2(\mathbb{R})$, $X_f$ is 1-Lipschitz by the duality between $d_1$ and
1-Lipschitz functions on $\mathbb{R}$. By Lemma 2.3, $\mu_M$ is $n^{-1/2}$-Lipschitz as a function of
$M$, and so the claim follows from the first part.

(4) This also follows from the first part and Lemma 2.3.

□
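The Lipschitz estimate invoked in parts (3) and (4) can be checked numerically. The sketch below assumes that Lemma 2.3 is the usual Hoffman-Wielandt-type bound $d_1(\mu_{M_1}, \mu_{M_2}) \le n^{-1/2} \|M_1 - M_2\|_{HS}$ (its statement lies outside this passage), and uses the fact that the $d_1$ distance between two empirical measures on $\mathbb{R}$ with equally many atoms is attained by matching sorted atoms. The helper names are hypothetical.

```python
import numpy as np

def d1_empirical(x, y):
    """Wasserstein-1 distance between two empirical measures on R with equally many atoms."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def random_hermitian(n, rng):
    """An arbitrary random Hermitian matrix, used only as test input."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / 2.0

rng = np.random.default_rng(1)
n = 100
m1 = random_hermitian(n, rng)
m2 = m1 + 0.1 * random_hermitian(n, rng)

# Left side: distance between empirical spectral measures.
lhs = d1_empirical(np.linalg.eigvalsh(m1), np.linalg.eigvalsh(m2))
# Right side: n^{-1/2} times the Hilbert-Schmidt (Frobenius) norm of the difference.
rhs = np.linalg.norm(m1 - m2, "fro") / np.sqrt(n)
```

For any pair of Hermitian matrices `lhs` should not exceed `rhs`, which is what makes $M \mapsto \mu_M$ an $n^{-1/2}$-Lipschitz map in this sense.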



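Theorem 3.8. In the setting of Theorem 3.7, there are constants $c, C, C', C''$ depending
only on the concentration hypotheses for $A$ and $B$, such that, writing $\mu = \mathbb{E}\mu_M$,

\[
\mathbb{E}\, d_1(\mu_M, \mu) \le \frac{C(\mathbb{E}\|M\|_{op})^{1/3}}{n^{2/3}} \le \frac{C'}{n^{2/3}},
\]

and so

\[
\mathbb{P}\left[ d_1(\mu_M, \mu) \ge \frac{C'}{n^{2/3}} + t \right] \le C'' e^{-cn^2t^2}
\]

for $t > 0$.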

The proof is exactly the same as the proof of Theorem 3.5.

Corollary 3.9. For each $n$, let $A_n, B_n \in M_n^{sa}$ be fixed matrices with spectra bounded
independently of $n$. Let $U_n \in \mathbb{U}(n)$ be Haar-distributed. Let $M_n = U_n A_n U_n^* + B_n$ and let
$\mu_n = \mathbb{E}\mu_{M_n}$. Then with probability 1,

\[
d_1(\mu_{M_n}, \mu_n) \le C n^{-2/3}
\]

for all sufficiently large $n$, where $C$ depends only on the bounds on the sizes of the spectra
of $A_n$ and $B_n$.

Proof. This follows from Theorem 3.8 using $t = n^{-2/3}$ and the Borel-Cantelli lemma. □
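For completeness, the Borel-Cantelli step can be spelled out as follows. This is a sketch which assumes that the tail bound of the preceding theorem has the form $C'' e^{-cn^2t^2}$, matching the exponent in part (4) of Theorem 3.7, and that the constants can be taken uniform in $n$ because the spectra are bounded independently of $n$.

```latex
% Sketch: substitute t = n^{-2/3} into the assumed tail bound
%   P[ d_1(\mu_{M_n}, \mu_n) >= C' n^{-2/3} + t ] <= C'' e^{-c n^2 t^2}.
\[
\mathbb{P}\left[ d_1(\mu_{M_n}, \mu_n) \ge (C' + 1)\, n^{-2/3} \right]
  \le C'' \exp\left[ -c\, n^2 \left( n^{-2/3} \right)^2 \right]
  = C'' e^{-c\, n^{2/3}},
\qquad
\sum_{n \ge 1} C'' e^{-c\, n^{2/3}} < \infty.
\]
```

Since the probabilities are summable, almost surely only finitely many of these events occur, which gives the stated bound for all sufficiently large $n$ with $C = C' + 1$.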